Detecting and labeling speakers on overlapping speech using vector taylor series

Authors

  • Pranay Dighe
  • Marc Ferras
  • Hervé Bourlard
Abstract

Successfully modeling overlapping speech is a crucial step towards improving the performance of current speaker diarization systems. In this direction, we present ongoing work on a novel Multi-Class Vector Taylor Series (MC-VTS) approach that models overlapping speech from knowledge of the individual speaker models and the feature extraction process. We explore several variants of the MC-VTS technique that aim at modeling overlapping speech more precisely. Bootstrapping the algorithm with both oracle and diarization output segmentations, we show the potential of this approach for overlapping speech detection and speaker labeling through a set of experiments on far-field microphone meeting data.
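
The abstract does not give the MC-VTS equations, so the following is only a generic illustration of the underlying idea, not the authors' method. It sketches a first-order Vector Taylor Series approximation of how two single-speaker Gaussians could be combined into an overlap Gaussian, under the assumption (ours, not the paper's) that features are log-spectral energies and that two simultaneous speakers add in the linear spectral domain, i.e. y = log(exp(x1) + exp(x2)). The function name `vts_overlap_gaussian` and the diagonal-covariance simplification are hypothetical.

```python
import numpy as np

def vts_overlap_gaussian(mu1, var1, mu2, var2):
    """First-order VTS approximation of y = log(exp(x1) + exp(x2)).

    Assumption (not from the paper): x1 and x2 are independent
    log-spectral feature vectors with diagonal Gaussian models
    N(mu1, var1) and N(mu2, var2).
    """
    mu1, var1, mu2, var2 = (
        np.asarray(a, dtype=float) for a in (mu1, var1, mu2, var2)
    )
    # Zeroth-order term: evaluate the log-add function at the means.
    mu_y = np.logaddexp(mu1, mu2)
    # Jacobian of y w.r.t. x1 at the expansion point (diagonal entries);
    # the Jacobian w.r.t. x2 is (1 - g). Computed in a numerically stable form.
    g = np.exp(mu1 - mu_y)
    # Linearised variance propagation for independent sources.
    var_y = g**2 * var1 + (1.0 - g) ** 2 * var2
    return mu_y, var_y

# Two speakers with equal mean energy in every band.
mu_y, var_y = vts_overlap_gaussian([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
```

When one speaker dominates a band, `g` approaches 1 and the overlap Gaussian approaches that speaker's own model, which is the behaviour one would expect from a log-add combination. A real system would also have to account for the remaining feature extraction steps (for example a cepstral transform), which this sketch leaves out.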

Similar resources

Modeling Overlapping Speech using Vector Taylor Series

Current speaker diarization systems typically fail to successfully assign multiple speakers speaking simultaneously. According to previous studies, overlapping errors account for a large proportion of the total errors in multi-party speech diarization. In this work, we propose a new approach using Vector Taylor Series (VTS) to obtain overlapping speech models assuming individual speaker models ...

Detection of Overlapping Speech in Meetings Using Support Vector Regression

A method of detecting overlapping speech in meetings is proposed in this paper. The eigenvalue distribution of the spatial correlation matrix reflects information on the relative power of sound sources. By applying Support Vector Regression to a set of input eigenvalues, the relative power of sources is estimated. Based on this, overlapping speech is then detected. The proposed method was evalu...

An Acoustic Study of Emotivity-Prosody Interface in Persian Speech Using the Tilt Model

This paper aims to explore some acoustic properties (i.e. duration and pitch amplitude of speech) associated with three different emotions: anger, sadness and joy against neutrality as a reference point, all being intentionally expressed by six Persian speakers. The primary purpose of this study is to find out if there is any correspondence between the given emotions and prosody patterning in P...

Acoustic Analysis of Whispered Speech for Phoneme and Speaker Dependency

Whisper is used by speakers in certain circumstances to protect personal information. Due to the differences in production mechanisms between neutral and whispered speech, there are considerable differences between the spectral structure of neutral and whispered speech, such as formant shifts and shifts in spectral slope. This study analyzes the dependency of these differences on speakers and p...

Detecting overlapping speech with long short-term memory recurrent neural networks

Detecting segments of overlapping speech (when two or more speakers are active at the same time) is a challenging problem. Previously, mostly HMM-based systems have been used for overlap detection, employing various different audio features. In this work, we propose a novel overlap detection system using Long Short-Term Memory (LSTM) recurrent neural networks. LSTMs are used to generate framewi...

Publication date: 2014